Learning parse structure of paragraphs and its applications in search

Author

  • Boris A. Galitsky
Abstract

We propose combining parse forest and discourse structures into a unified representation for a paragraph of text. The purpose of this representation is to answer complex, paragraph-sized questions in a number of product- and service-related domains. A candidate set of answers, obtained by keyword search, is re-ranked by matching the sequence of parse trees of an answer with that of the question. To do that, a graph representation and learning technique for parse structures of paragraphs of text has been developed. We introduce the parse thicket (PT): a set of syntactic parse trees augmented by a number of arcs for inter-sentence word–word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act and Rhetoric Structure theories. The operation of generalization of logical formulas is extended to parse trees and then to parse thickets to compute similarity between texts. We provide a detailed illustration of how PTs are built from parse trees and generalized. The proposed approach is evaluated in the product search and recommendation domain of eBay.com, where user queries include product names, desired features and expressions of user needs in multiple sentences. We demonstrate that PT generalization improves search relevance, using the Bing search engine API as a baseline. We perform a comparative analysis of the contribution of various sources of discourse information to relevance. An open source plugin for SOLR is developed so that the proposed technology can be easily integrated with industrial search engines. © 2014 Elsevier Ltd. All rights reserved.
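
The re-ranking idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a parse thicket is flattened into a set of labeled arcs (syntactic or discourse), and it approximates generalization by pairwise arc matching rather than by maximal common sub-graphs. The names Node, Arc, generalize_node, generalize_arcs, score and rerank, as well as the scoring weights, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Node(NamedTuple):
    lemma: str  # word lemma, or "*" when generalized away
    pos: str    # part-of-speech tag


class Arc(NamedTuple):
    head: Node
    label: str  # e.g. a syntactic relation, or a discourse relation such as "coref"
    dep: Node


def generalize_node(a: Node, b: Node) -> Optional[Node]:
    """Return the most specific node covering both, or None if POS tags differ."""
    if a.pos != b.pos:
        return None
    return Node(a.lemma if a.lemma == b.lemma else "*", a.pos)


def generalize_arcs(first: Set[Arc], second: Set[Arc]) -> Set[Arc]:
    """Simplified generalization: the generalized arcs shared by two arc sets."""
    common = set()
    for x in first:
        for y in second:
            if x.label != y.label:
                continue
            head = generalize_node(x.head, y.head)
            dep = generalize_node(x.dep, y.dep)
            if head is not None and dep is not None:
                common.add(Arc(head, x.label, dep))
    return common


def score(common: Set[Arc]) -> float:
    """Assumed weighting: 1.0 per matching lemma, 0.5 per POS-only match."""
    def weight(node: Node) -> float:
        return 1.0 if node.lemma != "*" else 0.5
    return sum(weight(arc.head) + weight(arc.dep) for arc in common)


def rerank(question: Set[Arc], candidates: Dict[str, Set[Arc]]) -> List[str]:
    """Order candidate answer ids by similarity of their arcs to the question's."""
    return sorted(
        candidates,
        key=lambda name: score(generalize_arcs(question, candidates[name])),
        reverse=True,
    )
```

In this sketch, a candidate that shares both lemmas and relation labels with the question (including inter-sentence arcs such as co-reference) ranks above one that only shares part-of-speech patterns. A fuller implementation would generalize whole phrases and sub-graphs instead of single arcs.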

Similar resources

Going beyond sentences when applying tree kernels

We go beyond the level of individual sentences applying parse tree kernels to paragraphs. We build a set of extended trees for a paragraph of text from the individual parse trees for sentences and learn short texts such as search results and social profile postings to take advantage of additional discourse-related information. Extension is based on coreferences and rhetoric structure relations ...

Improving Text Retrieval Efficiency with Pattern Structures on Parse Thickets

We develop a graph representation and learning technique for parse structures for paragraphs of text. We introduce Parse Thicket (PT) as a sum of syntactic parse trees augmented by a number of arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act and Rhetoric Structure theories. The operatio...

Extending Tree Kernels Towards Paragraphs

We extend parse tree kernels from the level of individual sentences towards the level of paragraph to build a framework for learning short texts such as search results and social profile postings. We build a set of extended trees for a paragraph of text from the individual parse trees for sentences. It is performed based on coreferences and Rhetoric Structure relations between the phrases in di...

Matching sets of parse trees for answering multi-sentence questions

The problem of answering multi-sentence questions is addressed in a number of products and services-related domains. A candidate set of answers, obtained by a keyword search, is re-ranked by matching the set of parse trees of an answer with that of the question. To do that, a graph representation and learning technique for parse structures for paragraphs of text have been developed. Parse Thick...

Machine learning of syntactic parse trees for search and classification of text

We build an open-source toolkit which implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization towards syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse tree as a syntactic similarity measure is defined as the set of maximum common subtrees and performed at a ...

Journal:
  • Eng. Appl. of AI

Volume 32

Publication date: 2014